Abstract

Grade information was considered by Yuan et al. (2007), who proposed the Quasi-CRM method to incorporate graded toxicity information into phase I trials. A potential problem with the Quasi-CRM is that the choice of skeleton can dramatically change the performance of the CRM, and the same holds for the Quasi-CRM. In this paper, we propose a new model based on a Bayesian model selection approach, the Robust Quasi-CRM, to address this pitfall of the Quasi-CRM. The Robust Quasi-CRM builds directly on the BMA-CRM proposed by Yin and Yuan (2009) by considering multiple skeletons in parallel within the Quasi-CRM. Extensive simulation studies demonstrate the robust performance of the Robust Quasi-CRM. We conclude that the proposed method can be readily used in practice.
Introduction

The primary goal of a phase I clinical trial of a new oncologic agent is to find a dose with acceptable toxicity, that is, the maximum tolerated dose (MTD). In practice, the MTD is often defined as the dose of the drug that produces a specified dose-limiting toxicity (DLT) in a pre-specified percentage of patients. Toxicity is often categorized into multiple grades. For instance, under the general guidelines of the Common Toxicity Criteria (CTC) (National Cancer Institute, 2003), grade 0 denotes no toxicity, and grades 1, 2, 3, 4 and 5 denote mild, moderate and severe toxicity, life-threatening toxicity and death, respectively. This comprehensive grading scale is well established and widely adopted in clinical practice, which indicates that a binary response may inappropriately ignore different levels of toxicity severity. However, in the most widely used dose allocation approaches, such as the traditional "3+3" design [1], the CRM design [2] and the recently proposed mTPI design [3], these grades are dichotomized. For example, if grade 4 fatigue is considered a DLT, then grades 0-3 are non-DLTs and are treated almost equally from the point of view of the trial design. Such dichotomization is known to work well for moderate toxicities. For severe and possibly irreversible effects, however, such as renal, liver, or neurological toxicities, the grades differ substantially in clinical impact; for example, grade 4 renal toxicity is much more severe than grade 3 renal toxicity.



Therefore, these toxicity grades should not be treated indiscriminately. In addition, because phase I trials are typically small, it is important to use as much information as possible for decision making; using only partial toxicity information can be inefficient. Dose escalation procedures should therefore make use of the full grade information.

Several proposals in the literature address this issue. Bekele and Thall (2004) [4] (the BT method) applied severity weights to a soft tissue sarcoma trial with five types of DLTs. Physicians assigned each observed patient a severity weight on a common numeric scale for each type of toxicity, and the sum of these weights over the five toxicity types was called the total toxicity burden (TTB). The authors then considered a hypothetical collection of cohorts with a variety of possible outcomes. Yuan, Chappell and Bailey (2007) [5] also used severity weights to convert toxicity grades into numerical scores, and proposed the Quasi-CRM approach to incorporate these scores into the CRM. The recommended dose for the next patient is the dose level whose estimated score, the equivalent toxicity (ET) score, is closest to the target score obtained from a pre-specified toxicity profile at the MTD. The Quasi-CRM has been shown to outperform the BT method in the percentage of times the optimal dose is recommended for further studies. Van Meter, Garrett-Mayer and Bandyopadhyay (2012) [6] incorporated toxicity grades into the likelihood-based CRM using a continuation ratio (CR) model, and showed that their method outperformed its dichotomous CRM counterpart.

In 2009, Yin and Yuan [7] proposed using multiple parallel CRM models, each with a different set of pre-specified toxicity probabilities (skeleton). In the Bayesian paradigm, they assign a discrete probability mass to each CRM model as its prior model probability. The posterior toxicity probabilities are then estimated by Bayesian model averaging (BMA), and dose escalation or de-escalation is determined by comparing the target toxicity rate with the BMA estimates of the dose toxicity probabilities. Yin and Yuan examined the properties of the BMA-CRM through extensive simulation studies and compared it and its variants with the original CRM. Their results show that the BMA-CRM is competitive and robust, and that it eliminates the arbitrariness of pre-specifying the toxicity probabilities. However, the BMA-CRM does not take multiple toxicity grades into account.

Although the Quasi-CRM has good statistical performance, it relies, like the CRM, on a single pre-specified skeleton for parameter estimation, which according to Yin and Yuan (2009) can lead to unstable estimates. In this article we incorporate the essence of the BMA-CRM approach into the Quasi-CRM paradigm, and call the resulting design the Robust Quasi-CRM. Numerical comparisons of the Robust Quasi-CRM and the Quasi-CRM are described in Section 3, followed by a conclusion in Section 4.

The Method 

1. Equivalent Toxicity Score 

The equivalent toxicity (ET) score was proposed by Yuan et al. (2007) to measure the relative severity of different toxicity grades in the dose allocation procedure: grade 3 toxicities are assigned a value of 1, grade 2 toxicities a value of 0.5, and grade 4 toxicities a value of 1.5, while grades 0 and 1 carry an ET score of 0. An ET score of 1 is taken as the cutoff for a DLT. With the ET score, the commonly used MTD definition is modified to incorporate grade information: the MTD is defined as the dose of the drug whose ET score equals the target ET score, computed from a pre-specified toxicity grade profile at the MTD. See also Bekele and Thall (2004) for details.
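As a minimal illustration, the grade-to-score mapping above can be written as a small lookup. This is a sketch under stated assumptions: each patient is summarized by a single (maximum) toxicity grade, grade 5 is not covered by the weights described here, and the helper names `et_score` and `is_dlt` are hypothetical.

```python
# Hypothetical helpers; ET weights as described above (grades 0 and 1 score 0).
# Assumption: each patient is summarized by one (maximum) toxicity grade in 0-4.
ET_WEIGHTS = {0: 0.0, 1: 0.0, 2: 0.5, 3: 1.0, 4: 1.5}


def et_score(grade: int) -> float:
    """Map a toxicity grade (0-4) to its equivalent toxicity (ET) score."""
    if grade not in ET_WEIGHTS:
        raise ValueError(f"grade must be 0-4, got {grade}")
    return ET_WEIGHTS[grade]


def is_dlt(grade: int) -> bool:
    """An ET score of 1 is the DLT cutoff, so grades 3 and 4 count as DLTs."""
    return et_score(grade) >= 1.0


cohort = [0, 2, 3, 4]                   # hypothetical observed grades
scores = [et_score(g) for g in cohort]  # [0.0, 0.5, 1.0, 1.5]
dlts = [is_dlt(g) for g in cohort]      # [False, False, True, True]
```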

2. Quasi-CRM 

The quasi-likelihood function is constructed using a family of probability distributions that may not contain the true distribution. Estimators obtained by maximizing the quasi-likelihood function are called quasi-maximum likelihood estimates (QMLEs). Under some regularity conditions, QMLEs are strongly consistent if the "quasi" distributions belong to linear exponential families such as the binomial family (Gourieroux, Monfort, and Trognon, 1984; McCullagh and Nelder, 1989) [8] [9].

Suppose that $n$ patients have been treated sequentially at dose levels $d(1), d(2), \dots, d(n)$ with corresponding ET scores $s(1), s(2), \dots, s(n)$. Define the normalized ET scores as

$$s^*(i) = s(i)/s_{\max}, \quad i = 1, 2, \dots, n,$$

where $s_{\max} < \infty$ is the maximum possible ET score. Thus $s^*(i) \in [0, 1]$, $i = 1, 2, \dots, n$. The normalized scores can be viewed as fractional events and modeled using the quasi-Bernoulli likelihood (Papke and Wooldridge, 1996) [10].



Clearly, their true distributions are not Bernoulli, since a Bernoulli variable takes only the values 0 and 1. However, if the dose-toxicity model is correctly specified, the QMLE will still be strongly consistent because the Bernoulli distribution belongs to the binomial family.

Assume that the true relationship between dose and normalized ET score is given by $p^*(d) = E(s^* \mid d)$. Consider $J$ dose levels $\{d_1, d_2, \dots, d_J\}$ and denote $p^*_j = p^*(d_j)$, $j = 1, 2, \dots, J$. The goal is to find the MTD $d_0$, the highest dose level such that $p^*(d_0) \le p_0/s_{\max}$, where $p_0$ is the target ET score. Assume that the last patient is treated at level $d(n) = d_j$ with normalized ET score $s^*(n)$. Then this patient's contribution to the quasi-Bernoulli likelihood is

$$\psi\{d(n), s^*(n)\} = (p^*_j)^{s^*(n)} (1 - p^*_j)^{1 - s^*(n)}.$$

The quasi-Bernoulli likelihood is updated by multiplying in $\psi\{d(n), s^*(n)\}$, and $\{p^*_j\}_{1 \le j \le J}$ can be estimated accordingly. Note that if no functional dose-score curve is assumed, the QMLE $\hat{p}_j = s_{\max}\hat{p}^*_j$, $j = 1, 2, \dots, J$, equals the observed average ET score at each dose level.
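The following sketch illustrates the nonparametric case just described: it normalizes hypothetical ET scores by $s_{\max}$, computes the QMLE at each dose as the mean normalized score, and checks by grid search that this mean maximizes the quasi-Bernoulli log-likelihood. The value $s_{\max} = 1.5$ (the grade-4 ET score), the data, and the function names are assumptions for illustration.

```python
import math
from collections import defaultdict


def quasi_bernoulli_loglik(p_star, s_star):
    """Sum of log psi = s* log p* + (1 - s*) log(1 - p*) over patients at one dose."""
    return sum(s * math.log(p_star) + (1.0 - s) * math.log(1.0 - p_star) for s in s_star)


def nonparametric_qmle(doses, scores, s_max):
    """Without a dose-score curve, the QMLE of p*_j is the mean normalized score at
    dose j, so p_hat_j = s_max * p*_hat_j is the observed average ET score there."""
    normalized = defaultdict(list)
    for d, s in zip(doses, scores):
        normalized[d].append(s / s_max)  # s*(i) = s(i) / s_max lies in [0, 1]
    return {d: s_max * sum(v) / len(v) for d, v in normalized.items()}


# Hypothetical data: dose index and ET score for five patients.
doses = [1, 1, 2, 2, 2]
scores = [0.0, 0.5, 0.5, 1.0, 1.5]
p_hat = nonparametric_qmle(doses, scores, s_max=1.5)

# Numerical check at dose 2: the mean normalized score maximizes the quasi-likelihood.
s_star_dose2 = [s / 1.5 for d, s in zip(doses, scores) if d == 2]
grid = [k / 1000 for k in range(1, 1000)]
best = max(grid, key=lambda p: quasi_bernoulli_loglik(p, s_star_dose2))
```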

The quasi-Bernoulli likelihood provides a simple way to 
incorporate ordinal grades into parametric models. Yuan 
et al.(2007) successfully used it with the CRM in developing the 
Quasi-CRM. 

3. Bayesian Model Selection Method 

As pointed out by Yin and Yuan (2009), a major issue with the CRM is that the pre-specification of the toxicity probabilities $(p_1, p_2, \dots, p_J)$ is arbitrary. If the $p_j$'s deviate far from the true dose-toxicity curve, the design may have poor operating characteristics and a high probability of selecting the wrong dose as the MTD. To avoid subjectivity in specifying the skeleton, they proposed pre-specifying multiple skeletons, each representing a set of prior estimates of the toxicity probabilities. During the trial, conditional on the observed data, these different models usually yield different estimates of the toxicity probabilities $(\hat{\pi}_1, \dots, \hat{\pi}_J)$. Some of these estimates may be close to the true values and others may not, depending on how well the models fit the accumulated data. To accommodate the uncertainty in the specification of the skeletons, Yin and Yuan (2009) used a BMA approach that averages $\hat{\pi}_j$ across the CRM models to obtain the BMA estimate of the toxicity probability at dose level $j$. BMA is known to provide better average predictive performance than any single model (Raftery, Madigan, and Hoeting, 1997; Hoeting et al., 1999) [11] [12].
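Written out, the BMA estimate of Yin and Yuan (2009) weights the estimate $\hat{\pi}_{kj}$ obtained under the $k$th skeleton by the posterior probability $\mathrm{pr}(M_k \mid D)$ of the corresponding model, as defined below:

$$\hat{\pi}_j = \sum_{k=1}^{K} \hat{\pi}_{kj}\,\mathrm{pr}(M_k \mid D), \qquad j = 1, \dots, J.$$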

Specifically, let $(M_1, \dots, M_K)$ be the models corresponding to the $K$ sets of prior guesses of the toxicity probabilities, $\{(p_{11}, \dots, p_{1J}), \dots, (p_{K1}, \dots, p_{KJ})\}$. Model $M_k$ ($k = 1, \dots, K$) in the CRM is given by

$$\pi_{kj}(\alpha_k) = p_{kj}^{\exp(\alpha_k)}, \quad j = 1, \dots, J,$$

which is based on the $k$th skeleton $(p_{k1}, \dots, p_{kJ})$. Let $\mathrm{pr}(M_k)$ be the prior probability that model $M_k$ is the true model, that is, the probability that the $k$th skeleton $(p_{k1}, \dots, p_{kJ})$ matches the true dose-toxicity curve. If there is no a priori preference for any single model, one can assign equal weights to the different skeletons by simply setting $\mathrm{pr}(M_k) = 1/K$. At a given stage of the trial, based on the observed data $D = \{(n_j, y_j), j = 1, \dots, J\}$, where $n_j$ patients have been treated at dose level $j$ and $y_j$ of them have experienced a DLT, the likelihood function under model $M_k$ is
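$$L(D \mid \alpha_k, M_k) = \prod_{j=1}^{J} \{p_{kj}^{\exp(\alpha_k)}\}^{y_j} \{1 - p_{kj}^{\exp(\alpha_k)}\}^{n_j - y_j}.$$

The posterior model probability for $M_k$ is given by

$$\mathrm{pr}(M_k \mid D) = \frac{L(D \mid M_k)\,\mathrm{pr}(M_k)}{\sum_{i=1}^{K} L(D \mid M_i)\,\mathrm{pr}(M_i)},$$

where $L(D \mid M_k)$ is the marginal likelihood of model $M_k$,

$$L(D \mid M_k) = \int L(D \mid \alpha_k, M_k)\, f(\alpha_k \mid M_k)\, d\alpha_k,$$

$\alpha_k$ is the power parameter in the CRM associated with model $M_k$, and $f(\alpha_k \mid M_k)$ is the prior distribution of $\alpha_k$ under model $M_k$.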
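A minimal numerical sketch of these formulas follows. It assumes a normal prior $\alpha_k \sim N(0, \sigma^2)$ with $\sigma^2 = 2$ (an assumption, not a value stated in the text) and approximates the marginal likelihood by Gauss-Hermite quadrature. The skeletons are the three used in the simulation section; the binary data $(n_j, y_j)$ are hypothetical.

```python
import numpy as np

# Skeletons from the simulation section; the data (n_j, y_j) below are hypothetical.
SKELETONS = np.array([
    [0.11, 0.25, 0.40, 0.55, 0.75, 0.85],
    [0.05, 0.06, 0.10, 0.12, 0.15, 0.20],
    [0.20, 0.40, 0.60, 0.75, 0.85, 0.95],
])


def crm_likelihood(alpha, skeleton, n, y):
    """L(D | alpha, M_k) = prod_j pi_j^y_j (1 - pi_j)^(n_j - y_j), pi_j = p_kj^exp(alpha)."""
    alpha = np.asarray(alpha, dtype=float)[..., None]  # one row per alpha value
    pi = np.asarray(skeleton, dtype=float) ** np.exp(alpha)
    return np.prod(pi ** y * (1.0 - pi) ** (n - y), axis=-1)


def marginal_likelihood(skeleton, n, y, sigma2=2.0, n_nodes=60):
    """L(D | M_k) = int L(D | alpha, M_k) f(alpha) d alpha with alpha ~ N(0, sigma2)
    (assumed prior), approximated by Gauss-Hermite quadrature."""
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    alpha = np.sqrt(2.0 * sigma2) * x  # change of variables for a N(0, sigma2) prior
    return float(np.sum(w * crm_likelihood(alpha, skeleton, n, y)) / np.sqrt(np.pi))


def posterior_model_probs(skeletons, n, y, prior=None):
    """pr(M_k | D) is proportional to L(D | M_k) pr(M_k); equal weights 1/K by default."""
    K = len(skeletons)
    prior = np.full(K, 1.0 / K) if prior is None else np.asarray(prior, dtype=float)
    weighted = prior * np.array([marginal_likelihood(s, n, y) for s in skeletons])
    return weighted / weighted.sum()


n = np.array([3, 3, 1, 0, 0, 0])
y = np.array([0, 1, 1, 0, 0, 0])
post = posterior_model_probs(SKELETONS, n, y)
```

With no data, every likelihood equals 1, so the posterior model probabilities reduce to the prior weights; this is a convenient sanity check for the quadrature.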

The posterior model probability is naturally linked to the Bayes factor, which in turn leads to the Bayesian model selection approach. The Bayes factor $B_{10}$ for model $M_1$ against another model $M_0$ given data $D$ is defined as the ratio of the posterior to prior odds,

$$B_{10} = \frac{\mathrm{pr}(D \mid M_1)}{\mathrm{pr}(D \mid M_0)},$$

which is the ratio of the marginal likelihoods, i.e., $\mathrm{pr}(D \mid M_k) = L(D \mid M_k)$. We can construct such Bayes factors for each of the models $(M_1, \dots, M_K)$ against $M_0$, denoted by $(B_{10}, \dots, B_{K0})$. Then the posterior model probability of $M_k$ is

$$\mathrm{pr}(M_k \mid D) = \frac{\eta_k B_{k0}}{\sum_{i=1}^{K} \eta_i B_{i0}},$$

where $\eta_k = \mathrm{pr}(M_k)/\mathrm{pr}(M_0)$ is the prior odds for $M_k$ against $M_0$, $k = 1, \dots, K$.
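To make the algebra concrete, the sketch below converts hypothetical marginal likelihoods and prior probabilities into Bayes factors $B_{k0}$ and prior odds $\eta_k$, using one of the $K$ models as the reference $M_0$, and confirms that the result matches direct normalization. The function name and the numbers are illustrative only.

```python
import math


def posterior_from_bayes_factors(marginal_liks, prior_probs, ref=0):
    """pr(M_k | D) = eta_k B_k0 / sum_i eta_i B_i0, with model `ref` as the baseline M_0."""
    L0, p0 = marginal_liks[ref], prior_probs[ref]
    bayes = [L / L0 for L in marginal_liks]  # B_k0 = L(D | M_k) / L(D | M_0)
    odds = [p / p0 for p in prior_probs]     # eta_k = pr(M_k) / pr(M_0)
    num = [e * b for e, b in zip(odds, bayes)]
    total = sum(num)
    return [v / total for v in num]


# Hypothetical marginal likelihoods for three skeletons with equal prior weights.
ml = [0.02, 0.01, 0.05]
prior = [1 / 3] * 3
via_bf = posterior_from_bayes_factors(ml, prior)
norm = sum(m * pr for m, pr in zip(ml, prior))
direct = [m * pr / norm for m, pr in zip(ml, prior)]
```

The choice of reference model does not matter: its marginal likelihood and prior probability cancel in the ratio.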

The Bayesian model selection approach can be used to estimate the toxicity probabilities and to make dose assignment decisions. Specifically, at each dose assignment decision, we select the model with the highest posterior probability,

$$k^* = \arg\max_{k \in \{1, \dots, K\}} \mathrm{pr}(M_k \mid D),$$

and use that model for inference and dose assignment.
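A sketch of the selection step, followed by one possible dose-assignment rule: choose the dose whose estimate under $M_{k^*}$ is closest to the target, with escalation limited to one level above the current dose, as in the simulation settings. The posterior probabilities, the estimates, and $s_{\max} = 1.5$ are hypothetical, and estimates and target must be on the same (normalized) scale.

```python
def select_model(post_probs):
    """k* = argmax_k pr(M_k | D), returned as a 0-based index."""
    return max(range(len(post_probs)), key=lambda k: post_probs[k])


def next_dose(estimates, target, current, max_step=1):
    """Hypothetical rule: the dose (0-based) whose estimate under M_{k*} is closest to
    the target, never escalating more than `max_step` levels above `current`."""
    closest = min(range(len(estimates)), key=lambda j: abs(estimates[j] - target))
    return min(closest, current + max_step)


post = [0.2, 0.5, 0.3]                            # hypothetical pr(M_k | D)
k_star = select_model(post)
estimates = [0.05, 0.10, 0.20, 0.30, 0.45, 0.60]  # hypothetical estimates under M_{k*}
target = 0.47 / 1.5                               # target ET score over assumed s_max
dose = next_dose(estimates, target, current=1)
```

Here the estimate closest to the target is at the fourth dose (index 3), but the escalation restriction caps the recommendation at index 2.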

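Unlike the Quasi-CRM, our proposed robust version pre-specifies $K$ different skeletons in parallel, $\{(p_{11}, \dots, p_{1J}), \dots, (p_{K1}, \dots, p_{KJ})\}$. After $n$ patients, the quasi-posterior estimate of the toxicity probability for dose $j$ under the $k$th skeleton is updated by

$$\hat{\pi}_{kj} = \int p_{kj}^{\exp(\alpha_k)} \, \frac{\psi(D \mid \alpha_k, M_k)\, f(\alpha_k \mid M_k)}{\int \psi(D \mid \alpha_k, M_k)\, f(\alpha_k \mid M_k)\, d\alpha_k} \, d\alpha_k,$$

where, in place of the binomial likelihood, we use the quasi-Bernoulli likelihood $\psi(D \mid \alpha_k, M_k) = \prod_{i=1}^{n} \psi\{d(i), s^*(i)\}$, in which a patient treated at $d(i) = d_j$ contributes

$$\psi\{d(i), s^*(i)\} = \{\pi_{kj}(\alpha_k)\}^{s^*(i)} \{1 - \pi_{kj}(\alpha_k)\}^{1 - s^*(i)}, \qquad \pi_{kj}(\alpha_k) = p_{kj}^{\exp(\alpha_k)}.$$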
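The quasi-posterior mean can be approximated numerically in the same way as the marginal likelihood integral. In this sketch the prior $\alpha_k \sim N(0, 2)$, the Gauss-Hermite rule, and the clipping constant are assumptions; `dose_idx` holds each patient's 0-based dose index and `s_star` the normalized ET scores.

```python
import numpy as np


def quasi_posterior_means(skeleton, dose_idx, s_star, sigma2=2.0, n_nodes=60):
    """pi_hat_kj = E[p_kj^exp(alpha) | D] under the quasi-Bernoulli likelihood
    psi = prod_i pi^s*(i) (1 - pi)^(1 - s*(i)), with an assumed N(0, sigma2) prior on
    alpha, approximated by Gauss-Hermite quadrature."""
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    alpha = np.sqrt(2.0 * sigma2) * x
    pi = np.asarray(skeleton, dtype=float)[None, :] ** np.exp(alpha)[:, None]  # (nodes, J)
    pi_c = np.clip(pi, 1e-12, 1.0 - 1e-12)  # keep the logs finite at extreme nodes
    idx = np.asarray(dose_idx, dtype=int)
    s = np.asarray(s_star, dtype=float)
    loglik = np.sum(s * np.log(pi_c[:, idx]) + (1.0 - s) * np.log1p(-pi_c[:, idx]), axis=1)
    wts = w * np.exp(loglik - loglik.max())  # stabilized quasi-posterior weights
    return wts @ pi / wts.sum()


skeleton = [0.11, 0.25, 0.40, 0.55, 0.75, 0.85]
prior_est = quasi_posterior_means(skeleton, [], [])               # no patients yet
post_est = quasi_posterior_means(skeleton, [0, 0], [1.0, 1.0])    # two maximal scores at dose 1
```

Observing maximal normalized scores at the lowest dose shifts weight toward smaller $\exp(\alpha_k)$, which raises the estimate at every dose level.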

Following the BMA-CRM approach, our proposed method also includes a stopping rule: if pr(toxicity rate at $d_1 > \phi$) > 0.9, the trial is terminated for safety. In other words, we require early termination of the trial if the lowest dose is too toxic, namely if

$$\mathrm{pr}\{\pi_{k^*1}(\alpha_{k^*}) > \phi \mid M_{k^*}, D\} > 90\%,$$

where $\phi$ denotes the target toxicity level. In later sections we refer to the proposed approach as the Robust Quasi-likelihood approach.
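The stopping probability can be evaluated on a dense grid over $\alpha_{k^*}$, which handles the indicator function more accurately than a low-order quadrature rule. The sketch assumes a $N(0, 2)$ prior for $\alpha_{k^*}$, the grid range $[-10, 10]$, and a target $\phi$ on the normalized ET scale ($0.47/1.5$); the function name and data are hypothetical.

```python
import math

import numpy as np


def prob_lowest_dose_too_toxic(skeleton, dose_idx, s_star, phi, sigma2=2.0):
    """pr(p_{k*1}^exp(alpha) > phi | M_{k*}, D) for the selected skeleton, using the
    quasi-Bernoulli likelihood of all patients and an assumed N(0, sigma2) prior."""
    alpha = np.linspace(-10.0, 10.0, 4001)          # grid over alpha_{k*}
    log_prior = -alpha ** 2 / (2.0 * sigma2)         # normal log density up to a constant
    pi = np.asarray(skeleton, dtype=float)[None, :] ** np.exp(alpha)[:, None]  # (grid, J)
    pi_c = np.clip(pi, 1e-12, 1.0 - 1e-12)
    idx = np.asarray(dose_idx, dtype=int)
    s = np.asarray(s_star, dtype=float)
    log_lik = np.sum(s * np.log(pi_c[:, idx]) + (1.0 - s) * np.log1p(-pi_c[:, idx]), axis=1)
    log_post = log_prior + log_lik
    wts = np.exp(log_post - log_post.max())
    return float(wts[pi[:, 0] > phi].sum() / wts.sum())  # mass where dose 1 exceeds phi


skeleton = [0.11, 0.25, 0.40, 0.55, 0.75, 0.85]
phi = 0.47 / 1.5
p_prior = prob_lowest_dose_too_toxic(skeleton, [], [], phi)
# Closed form with no data: p1^exp(alpha) > phi  <=>  alpha < log(log(phi) / log(p1)).
c = math.log(math.log(phi) / math.log(skeleton[0]))
p_exact_prior = 0.5 * (1.0 + math.erf(c / math.sqrt(2.0 * 2.0)))
p_high = prob_lowest_dose_too_toxic(skeleton, [0] * 5, [1.0] * 5, phi)
stop = p_high > 0.9
```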



Simulation studies 

1. Simulation settings 

We investigated the operating characteristics of the proposed Robust Quasi-likelihood approach through simulation studies under eight different toxicity scenarios. Table 1, taken from Yuan et al. (2007), gives the probabilities of grades 0-4 in each scenario. We considered six dose levels and assumed that toxicity increased monotonically with dose. We prepared three sets of initial guesses of the toxicity probabilities:

Skeleton 1: (0.11, 0.25, 0.40, 0.55, 0.75, 0.85)
Skeleton 2: (0.05, 0.06, 0.10, 0.12, 0.15, 0.20)
Skeleton 3: (0.20, 0.40, 0.60, 0.75, 0.85, 0.95)



The first skeleton started at a moderate toxicity probability of 0.11 and increased quickly to 0.85 at the highest dose. The second skeleton represented the case in which toxicity increased slowly at the low doses and more quickly at the moderate doses, with the highest dose having a toxicity probability of only 0.20. The toxicity probabilities in the third skeleton were spread fairly evenly over the range 0.20 to 0.95. Thus these three skeletons represented three different prior opinions on the true dose-toxicity curve. We refer to the individual Quasi-CRMs using these three skeletons as Quasi-CRM 1, Quasi-CRM 2, and Quasi-CRM 3.

In Table 2, for each scenario we list the true toxicity probabilities in the first row; the corresponding ET scores in the second row; the MTD selection percentages and the average numbers of patients treated at each dose for the Quasi-CRM with each of the three skeletons in rows 3-8; and the results of the proposed Robust Quasi-CRM in rows 9-10.

In the simulations, the target ET score was 0.47, which corresponds to a DLT probability of 0.33. That is, for the toxicity profile of 49% grade 0 or 1, 18% grade 2, 23% grade 3, and 10% grade 4, the target ET score is the weighted sum of the ET scores over all grades:

$$R_0 = 0.49 \times 0 + 0.18 \times 0.5 + 0.23 \times 1.0 + 0.10 \times 1.5 = 0.47.$$

All simulations began at the lowest dose, and cohorts of one patient were treated at each stage. Dose escalation was restricted to the next higher pre-specified dose. Each scenario was simulated 1,000 times with a maximum sample size of 20.
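The target-score arithmetic above can be checked in a few lines; the sketch also recovers the DLT probability of 0.33 as the combined probability of grades 3 and 4 (ET score of at least 1). The normalized target assumes $s_{\max} = 1.5$.

```python
# Toxicity profile at the MTD used in the simulations (grades 0 and 1 pooled).
profile = {"0-1": 0.49, "2": 0.18, "3": 0.23, "4": 0.10}
et_weight = {"0-1": 0.0, "2": 0.5, "3": 1.0, "4": 1.5}

target_et = sum(profile[g] * et_weight[g] for g in profile)          # R_0
dlt_prob = sum(p for g, p in profile.items() if et_weight[g] >= 1.0)  # grades 3 and 4
normalized_target = target_et / 1.5  # assumes s_max = 1.5 (the grade-4 ET score)
```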

2. Simulation results 

In scenario A, the fourth dose was the desirable dose, and the three individual Quasi-CRMs using different skeletons selected it with very different probabilities. Quasi-CRM 1 performed the best, correctly identifying the MTD 49.3% of the time, and Quasi-CRM 2 performed the worst, correctly identifying the MTD only 36.4% of the time. The proposed Robust Quasi-CRM correctly identified the MTD 46.6% of the time. In this case, although Quasi-CRM 1 was slightly better than the Robust Quasi-CRM, the two were very comparable, both in correctly identifying the MTD and in the number of subjects treated above the MTD.

Scenario B had the MTD at the fourth dose level, and the MTD selection percentage of the Robust Quasi-CRM was the second best among the four designs. The worst skeleton corresponded to Quasi-CRM 2, which correctly identified the MTD only 40.3% of the time, whereas the proposed design correctly identified the MTD 56.9% of the time. In scenario C, the sixth dose was the MTD. Quasi-CRM 3 performed the worst in this scenario, with a correct MTD identification rate almost 50% lower than those of the other designs. In this case, Quasi-CRM 1 also performed worse than the proposed Robust Quasi-CRM.

In scenario D, the first dose is the MTD. Quasi-CRM 1 correctly identified the MTD 70% of the time, while the proposed Robust Quasi-CRM correctly identified it 73.1% of the time. With respect to the number of patients assigned to doses with ET scores above the target, our robust design was the second best. Scenario E is similar to scenario A. In scenario F, the percentages of correct MTD identification were quite close across designs, except that Quasi-CRM 2 assigned more patients to doses above the target ET score than the others. In scenarios G and H, the proposed Robust Quasi-CRM was again very robust, with an MTD selection percentage always close to that of the best-performing Quasi-CRM.

These findings demonstrate that the skeleton indeed plays a critical role in the Quasi-CRM design. In scenario E, the MTD selection probabilities obtained with different skeletons differed by more than 55%. Our Robust Quasi-CRM nevertheless performed the second best, identifying the MTD 90.2% of the time.

Based on these simulations, we conclude that the proposed Robust Quasi-CRM is quite robust in terms of dose selection probabilities and the average number of patients treated at the MTD. It typically does not perform as well as the best single Quasi-CRM in the skeleton set, but its performance is always quite close to that of the best single Quasi-CRM and can be much better than that of the worst. The proposed method carries the essence of the BMA-CRM of Yin and Yuan (2009) by adaptively balancing among competing models, and thus offers more reliable and robust estimates of the toxicity probabilities.

Conclusion 

In this paper we proposed a robust version of the Quasi-CRM for modeling toxicity grades and demonstrated by simulation that it performs more robustly than the single-skeleton Quasi-CRM. As pointed out by Yuan et al. (2007), the Quasi-CRM is most useful when DLTs are severe, possibly irreversible, or of long duration.

The proposed design can substantially improve on the original Quasi-CRM when the skeleton happens to be far from the true model. The Robust Quasi-CRM is straightforward to implement and can be computed easily using Gaussian quadrature or Markov chain Monte Carlo. Our method requires specifying multiple skeletons to cover different potential scenarios for the underlying dose-toxicity curve, and it provides a natural compromise among the initial guesses of toxicity probabilities from different physicians. If one skeleton corresponds to the true toxicity probabilities, the Robust Quasi-CRM would perform very well, because it often performs similarly to the best-performing Quasi-CRM. This Bayesian model-averaging procedure dramatically improves the robustness of the Quasi-CRM. As the simulations show, a single skeleton can yield under-performing results; simultaneously specifying multiple skeletons, however, reduces the chance that every set of toxicity probabilities leads to a poorly performing Quasi-CRM design. The arbitrariness in the specification of the skeleton is eliminated by incorporating the uncertainty associated with each skeleton into the Bayesian model-averaging procedure.

In our simulations we used a cohort size of one; however, cohort sizes of two or three could also be used. Our setup builds on the improved versions of the Quasi-CRM to optimize its practical performance. As an extension of the Quasi-CRM, the Robust Quasi-CRM makes this trial design more widely applicable and reliable for phase I clinical trials.